Method for sharing configuration data for high logic density on chip

ABSTRACT

A system for reducing the number of programmable architecture elements in a look-up table required for implementing Boolean functions or operations that are identical or logically equivalent is provided. The system may include a single set of storage elements connected to the inputs of multiple decoders, and the storage elements may be concurrently accessed by the decoders to provide simultaneous multiple outputs thereto.

FIELD OF THE INVENTION

[0001] The present invention relates to the field of electronic circuits, and, more particularly, to a system for reducing the number of programmable architecture elements in a look-up table for implementing Boolean functions or operations. In other words, the invention relates to sharing configuration data for higher logic density on a chip.

BACKGROUND OF THE INVENTION

[0002] The present invention relates to programmable logic devices, particularly field programmable gate arrays (FPGAs). Lookup tables are highly configurable combinatorial logic devices. Their programming flexibility makes them desirable for use as basic building blocks in programmable logic devices (PLDs). Referring to FIG. 1, a generic implementation of a look-up table (LUT) in accordance with the prior art is shown. Generally, a lookup table includes an array of programmable architecture elements, such as SRAM cells 2, for storing data bits which are effectively used as output signals. Each one of the data bits corresponds to a particular set of input signals 1 b.

[0003] A particular data bit 1 b is coupled to the look-up table output terminal 1 c by decoding multiplexer circuitry 1 that is controlled by the lookup table input signals. The inputs 1 b and outputs 1 c of the look-up table usually connect to the routing resources of the CPLD/FPGA. In many cases, the output 1 c is also coupled to a sequential element (flip-flop, latch, etc.). The inputs 1 b and outputs 1 c can, of course, be connected to other devices.

[0004] A generic look-up table is used to implement any function that can be bounded within the inputs and outputs of the LUT. Larger functions are broken down into smaller functions tailored to fit into the LUTs. The device routing resources connect these LUTs. Typically, in a circuit netlist mapped into LUTs, multiple LUTs are used to implement identical or similar functionality. An example of this would be an adder implementation in which the number of LUTs programmed to perform addition is directly proportional to the number of bits being added. It follows that a plurality of LUTs would be programmed with the same set of configuration bits. As will be discussed further below, numerous such scenarios exist.
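The adder scenario can be illustrated with a minimal sketch, assuming a ripple-carry adder in which each bit position uses one four-input LUT for the sum and one for the carry, with the fourth input unused. The helper names (lut_config, sum_bit, carry_bit, adder_lut_configs) are hypothetical and are introduced only for illustration; they do not come from the disclosure.

```python
from itertools import product

def lut_config(func, n_inputs=4):
    """Return the 2**n_inputs configuration bits of a LUT implementing func.

    Bit i holds the output for the input combination whose binary
    encoding is i (first input is the most significant bit).
    """
    return tuple(func(*bits) for bits in product((0, 1), repeat=n_inputs))

# Sum and carry of a one-bit full adder; the fourth LUT input is unused
# (an assumption of this sketch).
def sum_bit(a, b, cin, _unused):
    return a ^ b ^ cin

def carry_bit(a, b, cin, _unused):
    return (a & b) | (a & cin) | (b & cin)

def adder_lut_configs(width):
    """Configuration bits for every LUT of a width-bit ripple-carry adder."""
    return [(lut_config(sum_bit), lut_config(carry_bit)) for _ in range(width)]

configs = adder_lut_configs(8)
distinct = {cfg for pair in configs for cfg in pair}
print(len(configs) * 2, "LUTs,", len(distinct), "distinct configurations")
```

Under these assumptions, the number of LUTs grows with the adder width while the number of distinct configuration bit patterns stays fixed.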

SUMMARY OF THE INVENTION

[0005] An object of the invention is to provide a look-up table architecture that reduces the number of LUTs required to implement identical and/or logically equivalent functions.

[0006] The present invention therefore provides a system for reducing the number of programmable architecture elements in a look-up table that are required for implementing Boolean functions or operations that are identical or logically equivalent. More particularly, the system may include a single set of storage elements connected to inputs of multiple decoders, where the storage elements are concurrently accessed by the decoders to provide simultaneous multiple outputs.

[0007] A controlled inverter or XOR gate may also be inserted before the output of any of the decoders to increase the number of derivative functions. The controlled inverter may be either static or dynamic. Further, buffers may be introduced at the output of the storage elements of the look-up table for driving the inputs of the decoders when the number of decoders exceeds the driving capacity of the storage elements.

[0008] In addition, a programmable switch network may selectively connect the outputs of the storage elements of the look-up table to the inputs of the decoders. The connections may be made to any location in the device regardless of its distance from the storage elements of the look-up table. SRAM or other suitable memory architecture may be used as the storage elements of the look-up table.

[0009] A method aspect of the invention is for reducing the number of programmable architecture elements in a look-up table that are required for implementing Boolean functions or operations which are identical or logically equivalent. The method may include connecting a single set of storage elements to the inputs of multiple decoders, and providing concurrent access to the storage elements by the decoders to generate simultaneous multiple outputs.

[0010] The method may further include performing controlled inversion of the output of any of the decoders to increase the number of derivative functions, where the controlled inversion may be either static or dynamic. The method may also include buffering the output of the storage elements of the look-up table for driving the inputs of the decoders when the number of decoders exceeds the driving capacity of the storage elements. Further, the outputs of the storage elements may be programmably connected to the inputs of the decoders.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] The invention will now be described with reference to the accompanying drawings, in which:

[0012] FIG. 1 is a schematic block diagram illustrating a prior art look-up table with one decoder;

[0013] FIG. 2 is a schematic block diagram of a look-up table according to the invention having multiple decoders;

[0014] FIG. 3 is a schematic block diagram of an alternate embodiment of the look-up table according to the invention having a controllable inverter or XOR gate at the output of the decoders; and

[0015] FIG. 4 is a schematic block diagram of another embodiment of the present invention using a programmable switch network.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0016] Programmable look-up tables (LUTs) are the main logic resources of the CPLD/FPGA and are typically spread across the device. As noted above with respect to FIG. 1, in prior art architectures the basic LUT structure is coupled to a single decoding multiplexer. The LUT illustratively shown in FIG. 1 is a four-input type having sixteen SRAM latches. Any logic function having up to four inputs and one output can be implemented in this LUT. When a circuit is implemented in the CPLD/FPGA, a large portion of the combinatorial logic is absorbed by the LUTs. The LUTs are appropriately connected through the device routing resources to match the mapped Boolean network.

[0017] Referring now to FIG. 2, in accordance with the invention a single or common set of SRAM latches 2 is connected to three decoding multiplexers 1. The sixteen inputs 1 a of the decoding multiplexers 1 are fed from the SRAM latches 2. The four select lines 1 b of each of the multiplexers are unique and unrelated. The same uniqueness holds true for the multiplexer outputs 1 c. This implies that the three multiplexers 1 can function independently as three four-input look-up tables with identical configuration bits, which are made available to the multiplexer inputs 1 a via a latch data bus 4.
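The arrangement of FIG. 2 can be modeled behaviorally as follows. This is a minimal sketch, not a circuit description: the class name SharedLut, its methods, and the parity configuration are assumptions chosen for illustration.

```python
class SharedLut:
    """One set of 16 configuration bits read by several 16:1 decoders."""

    def __init__(self, config_bits, n_decoders=3):
        if len(config_bits) != 16:
            raise ValueError("a four-input LUT needs 16 configuration bits")
        self.config_bits = tuple(config_bits)  # the common SRAM latches
        self.n_decoders = n_decoders

    @staticmethod
    def _decode(bits, select):
        """16:1 multiplexer: four select lines pick one stored bit."""
        a, b, c, d = select
        return bits[(a << 3) | (b << 2) | (c << 1) | d]

    def read(self, selects):
        """Evaluate every decoder at once, each with its own select lines."""
        if len(selects) != self.n_decoders:
            raise ValueError("one select tuple per decoder is required")
        return tuple(self._decode(self.config_bits, s) for s in selects)

# Configuration for four-input parity, chosen arbitrarily for the example.
parity = [bin(i).count("1") & 1 for i in range(16)]
lut = SharedLut(parity)
outputs = lut.read([(0, 0, 0, 1), (1, 1, 0, 0), (1, 1, 1, 0)])
print(outputs)  # (1, 0, 1)
```

Each decoder receives independent select values and produces an independent output, yet only one copy of the sixteen configuration bits is stored.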

[0018] As will be appreciated by those skilled in the art, designs including repetitive LUT bit patterns can be readily accommodated in the latch-multiplexer arrangement illustrated in FIG. 2. The most common designs with these patterns are adders, subtractors, multipliers and counters. In a circuit netlist, many of the Boolean functions mapped into LUTs are similar in terms of logical equivalence. Two or more Boolean functions are said to be logically equivalent if they can be implemented in look-up tables with identical configuration bits. Considering a four-input, single-output look-up table of the preferred embodiment, examples of logically equivalent functions include:

[0019] 1. ABC+D, AB+C, ABC, A+B, AB;

[0020] 2. AB˜+CD, AB˜+C, A+BC, A˜+BC, AB˜, AB, A+B, A+B˜; and

[0021] 3. AB(C+D˜), A(B+C˜), ABC, ABC˜, AB+C˜, AB˜+C, A+B, A+B˜, AB, AB˜.

[0022] It should be noted that the functions in each group will have the same set of configuration bits. The four-variable function is the parent function, and the rest are its derivatives or children. The children have fewer variables. Thus, some of the LUT pins will remain unused or will have to be pulled up/down while implementing the children functions. It is also possible for children functions to have the same number of variables as their parent function. Such children, and many more functions, can be derived from the parent function if one or more inputs to the multiplexer select lines are connected through controlled inverters. Still more are possible if the multiplexer's output is also invertible.
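The parent/child relationship can be checked with a short sketch for the first group, taking ABC+D as the parent. The pin assignments below (which select pins are pulled up or down for each child) are assumptions made for illustration; other assignments may work equally well.

```python
from itertools import product

# Configuration bits of the parent function f(A, B, C, D) = ABC + D.
PARENT = tuple((a & b & c) | d for a, b, c, d in product((0, 1), repeat=4))

def read_lut(bits, a, b, c, d):
    """16:1 decode of the stored bits by four select lines."""
    return bits[(a << 3) | (b << 2) | (c << 1) | d]

# Each child: (reference function, wiring of child inputs onto the LUT pins).
# A constant 1 or 0 models a pin pulled up or down.
CHILDREN = {
    "AB+C": (lambda a, b, c: (a & b) | c, lambda a, b, c: (a, b, 1, c)),
    "ABC":  (lambda a, b, c: a & b & c,   lambda a, b, c: (a, b, c, 0)),
    "A+B":  (lambda a, b, c: a | b,       lambda a, b, c: (a, 1, 1, b)),
    "AB":   (lambda a, b, c: a & b,       lambda a, b, c: (a, b, 1, 0)),
}

def child_matches(name):
    """True if the child is reproduced by the parent's bits for all inputs."""
    func, wiring = CHILDREN[name]
    return all(
        func(a, b, c) == read_lut(PARENT, *wiring(a, b, c))
        for a, b, c in product((0, 1), repeat=3)
    )

results = {name: child_matches(name) for name in CHILDREN}
print(results)
```

Every child in this sketch is read from the same sixteen bits as the parent; only the connections to the select pins differ.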

[0023] Turning now to FIG. 3, a controlled inverter/XOR gate 3 is illustratively shown before the output of the decoding multiplexer to increase the number of derivative functions. The inverter control 3 a may be either static or dynamic. If the sixteen latches of the preferred embodiment source several decoding multiplexers, buffers 4 a may optionally be inserted at the latch outputs or along the latch data bus 4 to communicate the latch data to distant multiplexers.
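The effect of the controlled inverter can be sketched behaviorally as an XOR between the decoder output and a control bit. The function names and the four-input AND configuration are assumptions for illustration. In this model, a control bit held constant would correspond to static control, and one driven by a signal at run time to dynamic control; that reading is an interpretation, not a statement from the disclosure.

```python
def read_lut(bits, select):
    """16:1 decode of the stored bits by four select lines."""
    a, b, c, d = select
    return bits[(a << 3) | (b << 2) | (c << 1) | d]

def decoder_with_inverter(bits, select, invert):
    """Decoder output passed through an XOR gate; invert=1 complements it."""
    return read_lut(bits, select) ^ invert

# Shared configuration bits for a four-input AND (arbitrary example).
AND4 = tuple([0] * 15 + [1])

for select in [(1, 1, 1, 1), (1, 0, 1, 1)]:
    plain = decoder_with_inverter(AND4, select, 0)     # behaves as AND
    inverted = decoder_with_inverter(AND4, select, 1)  # behaves as NAND
    print(select, plain, inverted)
```

Two decoders sharing the same bits can thus deliver a function and its complement at the same time.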

[0024] As will be appreciated by those skilled in the art, the lines conveying latch signals to the decoding multiplexers need not be optimized for delay if the CPLD/FPGA is not intended to be used as an extremely rapid reconfigurable device. Thus, in nearly all cases the latch data bus 4, which is sixteen bits wide here, may be of minimum width with minimum spacing and can follow any circuitous or meandering path to the decoding multiplexers. Therefore, routing the latches to the decoding multiplexers is not a critical design issue and may be given the lowest priority while designing the device.

[0025] Another alternate embodiment of the invention is illustrated in FIG. 4. Here, the latch data buses 4 extend across the device. Programmable switches 4 b are inserted to distribute and route the latch data buses 4 originating from various LUT SRAM latches 2. These latch data buses may be routed to nearby or remote regions of the device using a programmable bus routing network 4. The routing network 4 introduces no extra delay in the final circuit implementation, because static logic data resides on these latch data lines. Thus, the latch data buses 4 can traverse long distances and use any number of programmable switches 4 b and 4 c without affecting circuit performance. Because all the nets in a bus are routed together, the configuration latch count is also low.

[0026] It is, of course, possible to alter the multiplexer connectivity to the latch data bus and even include flexibility at the decoding multiplexer inputs 1 a (sixteen here). Inputs to the multiplexer 1 x delineate such possibilities. To include more flexibility, programmable switches 4 c are inserted to couple latch data bus 4 to the multiplexer inputs 1 a as desired.
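The programmable connection of latch data buses to decoders can be sketched as a simple routing table. The class SwitchNetwork, the decoder names, and the two example configurations are hypothetical; the sketch routes each sixteen-bit bus as a whole and does not model the per-input flexibility of switches 4 c.

```python
class SwitchNetwork:
    """Routes whole 16-bit latch data buses to decoders (illustrative model)."""

    def __init__(self, latch_sets):
        self.latch_sets = {name: tuple(bits) for name, bits in latch_sets.items()}
        self.routes = {}  # decoder name -> name of the latch set it reads

    def connect(self, decoder, latch_set):
        """Program the switches so that decoder reads latch_set."""
        if latch_set not in self.latch_sets:
            raise KeyError(latch_set)
        self.routes[decoder] = latch_set

    def read(self, decoder, select):
        """Decode the routed bus with the decoder's own select lines."""
        bits = self.latch_sets[self.routes[decoder]]
        a, b, c, d = select
        return bits[(a << 3) | (b << 2) | (c << 1) | d]

net = SwitchNetwork({
    "and4": [0] * 15 + [1],  # four-input AND
    "or4": [0] + [1] * 15,   # four-input OR
})
net.connect("mux0", "and4")
net.connect("mux1", "or4")
net.connect("mux2", "and4")
values = [net.read(m, (1, 0, 1, 1)) for m in ("mux0", "mux1", "mux2")]
print(values)  # [0, 1, 0]
```

One routing entry per decoder selects an entire bus, which reflects the observation that routing all nets of a bus together keeps the number of configuration bits for the network small.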

[0027] The present invention thus provides a look-up table with a plurality of read ports capable of performing functions that are identical or logically equivalent, together with other variants. The area saved by using this approach is appreciable and comes without significant delay overhead. In conclusion, sharing the SRAM latches of LUTs reduces the number of SRAM latches and improves chip density.

[0028] While exemplary embodiments of the present invention have been described above, it is possible to use various alternatives, modifications and equivalents thereof. For example, a memory architecture other than SRAM may be used as the configuration host. These embodiments and others that will be apparent to those of skill in the art in light of the above disclosure are intended to fall within the scope of the present invention. 

That which is claimed is:
 1. A system for reducing the number of programmable architecture elements in a look-up table required for implementing Boolean functions or operations that are identical or logically equivalent, comprising: a single set of storage elements of said look-up table connected to the inputs of multiple decoders; said storage elements being concurrently accessed by said decoders to provide simultaneous multiple outputs.
 2. A system as claimed in claim 1 wherein a controlled inverter or XOR gate is inserted before the output of any of said decoders in order to increase the number of derivative functions.
 3. A system as claimed in claim 2 wherein said controlled inverter has a static or dynamic control.
 4. A system as claimed in claim 1 wherein buffers are introduced at the output of said storage elements of said look-up table for driving the inputs of said decoders when the number of decoders exceeds the driving capacity of said storage elements.
 5. A system as claimed in claim 1 wherein a programmable switch network is provided selectively connecting the outputs of said storage elements of said look-up table to the inputs of said decoders.
 6. A system as claimed in claim 5 wherein said connections can be made to any location in the device regardless of its distance from said storage elements of said look-up table.
 7. A system as claimed in claim 1 wherein memory architecture other than SRAM is employed as said storage elements of said look-up table.
 8. A method for reducing the number of programmable architecture elements in a look-up table required for implementing Boolean functions or operations that are identical or logically equivalent, comprising: connecting a single set of storage elements to the inputs of multiple decoders; and providing concurrent access to said storage elements by said decoders to generate simultaneous multiple outputs.
 9. A method as claimed in claim 8 further comprising controlled inversion of the output of any of said decoders in order to increase the number of derivative functions.
 10. A method as claimed in claim 9 wherein said controlled inversion is static or dynamic.
 11. A method as claimed in claim 8 further comprising buffering of the output of said storage elements of said look-up table for driving the inputs of said decoders when the number of decoders exceeds the driving capacity of said storage elements.
 12. A method as claimed in claim 8 further including programmably connecting the outputs of said storage elements to the inputs of said decoders.